Spoken English Learner Corpora
Abstract
In this paper we present a survey of some of the most significant spoken English learner corpora created to date. Spoken learner corpora, which contain speech produced by learners, are important in many areas of research and practice. In particular, they are used to identify typical pronunciation errors of learners of English as a second language (ESL), English as a foreign language (EFL), or English as a lingua franca (ELF). Data on common errors helps in designing more effective methods of pronunciation teaching as part of language training. Error patterns can also be implemented in intelligent tutoring systems for English learning. There they can be used to design error-preventive explanations and exercises and to generate relevant feedback for the learner. The corpora surveyed in this article include various types of English speech produced by learners whose first language (L1) is Arabic, Chinese, French, German, Greek, Japanese, Korean, Norwegian, Polish, or Spanish, among others. Some of the corpora described here were compiled for a single L1, while others cover several first languages. Learner corpora also vary in the type of English they represent: ESL, EFL, ELF, or combinations of these.
Journal: Research in Computing Science
Volume 130
Publication date: 2016